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Abstract 

The solution of linear inverse problems when the unknown parameters outnumber data requires addressing 
the problem of a nontrivial null space. After restating the problem within the Bayesian framework, a priori 
information about the unknown can be utilized for determining the null space contribution to the solution. More 
specifically, if the solution of the associated linear system is computed by the Conjugate Gradient for Least 
Squares (CGLS) method, the additional information can be encoded in the form of a right preconditioner. In 
this paper we study how the right preconditioned changes the Krylov subspaces where the CGLS iterates live, 
and draw a tighter connection between Bayesian inference and Krylov subspace methods. The advantages of 
a Krylov-meet-Bayes approach to the solution of underdetermined linear inverse problems is illustrated with 
computed examples. 

Key words: Underdetermined linear system, iterative linear solvers, Bayesian inverse problems, termination crite¬ 
rion, effective null space. 


1 Introduction 

Severely underdetermined inverse problems in which the dimensions of the unknown greatly outnumber the data are 
commonly encountered in applications, and the challenge is to introduce in a numerically efficient way additional 
and complementary information to overcome the problem of scarcity of the data. The Bayesian statistical framework 
provides a systematic way to augment the observation model with prior information |[T^ l28l 1411 l42ll ; however, the 
implementation of effective algorithms for computing informative pointwise estimators pose a challenge. In this 
paper we are interested in solving severely underdetermined linear inverse problems with the Conjugate Gradient 
for Least Squares (CGLS) method equipped with a suitable stopping rule. In particular, we want to analyze how 
the Krylov subspace method is affected when we address the non-uniqueness of the solution within the Bayesian 
statistical framework. This work is, in particular, motivated by the interest in large-scale underdetermined linear 
inverse problems that arise in important biomedical applications, such as electrical impedance tomography (EIT) 
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ifTSll and magnetoencephalography (MEG) ||2l, in which the CGLS method combined with Bayesian models has 
proven to be particularly efficient, see, e.g. ll2^[^IT7l . 

To introduce the setting, assume that we want to estimate a vector x G M” from the observation of the vector b G 
satisfying 

b = F{x)+£, (1) 

where m < n, and e represents additive noise. In the Bayesian setting all unknowns are modeled as random variables 
|[T2]| described by their probability distributions that we assume here to be expressed in terms of probability density 
functions (pdf). Denote by vTnoise the pdf of the noise and let the prior density vTprior encode the information about x 
prior to considering the observations. If x and e are statistically independent, the posterior density of x conditioned 
on the observed value of b is, according to Bayes’ formula, given by 

7r(x I 6) OC TTprior(3^)'^noise^ — ^observed; (2) 

where “oc” stands for “proportional up to a normalizing constant”. Here we concentrate on observation models that 
are liner, that is, F{x) = Ax, where A G is a given matrix. The approach that we analyze carries over to 

non-linear problems through a sequence of local linearizations, as has been demonstrated in several articles: see, 
e.g., |[Il|5l|29l|30l|3l[35l. For the sake of clarity, we limit our discussion to the case of Gaussian distributions. The 
results of our analysis can be extend naturally to hierarchical Bayesian models and furthermore to non-Gaussian 
problems, through distributions belonging to exponential family; however, the extension will be addressed in a 
separate contribution. 

An important application that motivates this work and gives an idea of the dimensions of the problems that we have 
in mind is the MEG inverse problem, in which the aim is to estimate electric activity of the brain by measuring the 
weak magnetic fields outside the skull. This is a classical inverse source problem for Maxwell’s equations and can 
be described in terms of a linear mapping: x G R^ represents the discretized sources inside the head, b G are the 
magnetometer recordings outside the head, and A G is the lead field matrix. In a typical MEG problem, m 

represents the number of channels in the device, and is of the order m ~ 100 — 300, while the number of unknowns 
is typically orders of magnitude larger, from tens of thousands upwards. In particular, since m <C n, a significant 
amount of additional information is needed to find a reasonable solution. 

In this paper we assume that the system ([T]l of linear equation that arise from the discretization of the underlying 
linear inverse problem is solved using a Krylov subspace iterative method, and in order that such a solution is 
representative in view of the Bayesian model Q, additional prior information is embedded in the algorithm via a 
right preconditioning matrix. If the linear system is not square, as is the case for the problems of interest to us, 
the Krylov subspace iterative solver of choice is CGES, originally proposed in |[26l . The analysis of the effects 
of a statistically inspired right preconditioner on the Krylov subspaces is carried out in detail here only for the 
CGES method, but analogous arguments and techniques can be used to extend the analysis to other Krylov subspace 
iterative methods. 

Originally preconditioners for linear systems were introduced to increase the convergence rate of iterative methods, 
a feature that made them not well suited for the solution of linear discrete inverse problems, because the speed¬ 
up may also increase the rate at which the amplified noise components contaminate the solution. In fact, a fast 
converging CGES iteration often returns a solution that satisfies the data adequately without capturing important 
features of the solution. To overcome this problem, special preconditioners for ill-posed problems were proposed 
in, e.g., Il24l l8l. The idea in this class of regularizing preconditioners was to accelerate the convergence rate of the 
portion of the solution in the subspace of the signal, while leaving the noise subspace unpreconditioned, a task that 
could be achieved with a preconditioning matrix approximating the inverse of A in the signal subspace and acting as 
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an identity on the noise subspace. Constructing such matrix would require a priori knowledge of the signal and noise 
subspaces. In the case of underdetermined ill-posed systems, preconditioning can be used to enrich the computed 
solution, albeit at the cost of slowing down the rate of convergence of the iterative method. 

In general, the solution of a linear system can be decomposed as the sum of a visible component which contributes 
to the data, and an invisible component, that belongs to the null space of the matrix. Effectively, the CGLS method 
equipped with a suitably chosen right preconditioner can leverage rich prior information about the solution to extract 
a significant component of the solution from the effective null space of the matrix A. The coupling between the 
visible and invisible part of the solution happens through non-orthogonal subspace decompositions based on the 
prior information implicitly applied by the iteration method. One of the aims of this work is to understand and 
quantify the changes to the fundamental subspaces of the linear system induced by the right prior conditioners, 
information that can be subsequently used in the design and selection of statistically inspired right preconditioners. 

The paper is organized as follows. In Section|^ we review some basic facts on linear inverse problems and regular¬ 
ization, including the standard and preconditioned CGLS as a regularization method. Using subspace decompsitions, 
we then analyze how the priorconditioning alters the fundamental subspaces and the induced effect on the solutions. 
We also discuss the convergence rate of the priorconditioned system through the Lanczos process. Finally in Sec¬ 
tion we elucidate the ideas with two computed examples. 


2 Linear inverse problems: Tikhonov vs. Krylov-subspace regularization 


Consider the discretized linear version of ([T]), 

b = Ax -|- £, (3) 

where A G m < n and e represent the noise due, e.g., measurement errors and model uncertainties. Adhering 

to the paradigm of Bayesian inference, all unknowns are modeled as random variables and are described in terms 
of their probably density functions. The belief about the unknown x prior to taking the data into consideration is 
expressed by its prior density, that here we assume to be a Gaussian density with mean zero and symmetric positive 
definite covariance matrix C G Furthermore, we assume that the noise vector e is a zero mean white Gaussian 

random variable. More general Gaussian noise can be reduced to this case multiplying both sides of the linear system 
by a suitable invertible matrix, a process known in signal processing as whitening. It follows from Bayes’ formula 
Q that the posterior density of x, which is the solution of the inverse problem in the Bayesian setting, is given by 


'7r(x I b) oc exp 



||Ax-6f 



(4) 


Consider a symmetric decomposition of the inverse of the covariance matrix, or the precision matrix, of the form 


C-i = 


(5) 


whose existence is guaranteed by the positive definiteness of C; here B can be, e.g., a Cholesky factor, or the square 
root of the precision matrix. By using the factorization Q, the negative logarithm of the posterior density Q, also 
known as Gibbs energy, can be written as 

G(x) = ||Ax — b\\'^ + ||Bx||^ 


A 

B 


X — 
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from which it follows that the maximizer of Q, known as the maximum a posteriori (MAP) estimate of x, can be 
found by solving the system 


A 

B 


X = 


b 

0 


in the least squares sense. The MAP estimate is also the solution of the linear system 

(A'^Aa = AJb, 

which are the normal equations for the penalized least squares problem 

X = argmin{||6 — Ax|p + A||Bx||^} 


( 6 ) 


associated with Tikhonov regularization with linear regularization function B and regularization parameter A equal 
to unity ifT^ . The standard version of Tikhonov regularization assumes that B = 1^, the unit matrix. The solution of 
([^ can be computed by first transforming the problem into standard form, an operation that requires the generalized 
singular value decomposition (GSVD) of the matrix pair (A, B): see, e.g., ||2T1, as we will review below. Strategies 
to reduce the high computational costs associated with the computation of the GSVD in the context of Tikhonov 
regularization have been proposed in ifTSl . Several numerical methods for the computation of Tikhonov regularized 
solution for large problems can be found in the literature; see, e.g., ||6l|71|9l. The determination of a suitable value 
for the regularization parameter for Tikhonov in standard form has been studied extensively in the literature, where 
it has been related to the amount of error in the right hand side. In the case where Tikhonov regularization operator 
is different from the identity it is not obvious how to determine the value of the regularization parameter without 
transforming the problem to standard form. 

When the unknown x has a large number of components, a computationally attractive alternative to Tikhonov reg¬ 
ularization with regularization operator B is to use a Krylov subspace iterative methods, equipped with a suitable 
stopping rule, to solve approximately the linear system 


Ax = b, (7) 

with the matrix B as a right preconditioner H [TTl as explained below. In the case where the matrix A is non square, 
a natural choice is to use the CGLS method. The CGLS method determines a sequence of approximate solutions 
of the linear equations associated with the linear system Q without explicitly forming the matrix A^A, but instead 
multiplying vectors with A and A^ separately. 

Iterative solution methods for large scale non square linear systems have been studied extensively, and different 
implementations of the idea behind the CGLS method which take into account the characteristics of the problem 
have been proposed in the literature. For a discussion of the different implementations and guidelines their relative 
advantages see, e.g., iS|33l[ill|3ll|39l . 

The stopping rule is an important component of iterative linear systems solvers. In general, the iteration stops when 
the norm of residual error is sufficiently small. It has been shown that when the linear systems is the discretization 
of a linear ill-posed problems, the CGLS method with a suitably modified sfopping rule is a regularizafion mefhod. 
In fhe nexf subsecfion we recall a few resulfs abouf Krylov subspace regularization for underdefermined linear 
systems, and we compare fhe regularized solufions compufed by CGLS in sfandard form and CGLS wifh a righf 
preconditioner. 
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2.1 Standard and priorconditioned CGLS 

The regularizing properties of the CGLS method equipped with a suitable stopping rule are well known: see, e.g., 
[211221123. Most of the studies of the properties of CGLS method as a regularization scheme have been 
carried out for overdetermined linear systems, where the number of equations exceeds the degrees of freedom of the 
problem. While a lot of the results carry over to the case of underdetermined problems, we will see below how the 
presence of a null space, which in some cases may be very large, changes the picture. 

The CGLS method for solving the linear system (|7]l starting with the initial approximate solution xq = 0 computes 
a sequence of approximate solutions xi,... until a termination criterion is satisfied. If the matrix A is ill- 
conditioned and the right side noisy, as is typically the case when 0 is the discretization of a linear inverse problem, 
the iterations are stopped as soon as the norm of the discrepancy, dk = b — Ax^ falls below a threshold value 
corresponding to the noise level in the data. The CGLS iterates are computed by projecting 0 onto a nested family 
of Krylov subspaces. More precisely, the jth iterate satisfies 

Xj = argminj ||Ax — 6|| | x G K,j{k^b, A'''A)} 

where 

/Cj(A^5, A^A) = span{A^6, (A^A)A^6,..., [k^ky-^k^b] (8) 

is fhe jfh Krylov subspace associafed wifh fhe mefhod. From fhe assumption fhaf fhe noise e G is addifive, zero 
mean while Gaussian if follows fhaf 

E{||ef } = m. 

If fhe rank of fhe Krylov subspaces continues fo increase, fhe iferafion is slopped as soon as 

IIAxfc - 6||^ < rm, 

where r > 0 is a safeguard factor, e.g., r = 1.2, and x^ is fhe regularized solufion defermined by fhe mefhod. 
Typically fhe stopping index k is much smaller lhan m. This Iasi observafions implies fhaf when using fhe CGLS 
mefhod for fhe solufion of discrele inverse problems, if is imporlanl fhaf fhe salienl fealures of fhe solufion veclor 
are included in fhe firsl few ilerales already, because if may require only a few iferafion sfeps lo salisfy fhe stopping 
crilerion. 

In general, fhe presence of a non-lrivial null space for fhe malrix A raises fhe need fo address how lo deal wifh 
fhe degrees of freedom in fhe solufion fhaf cannol be defermined by fhe dafa. The slandard CGLS mefhod sels fhe 
null-space componenls of fhe compufed solulions to zero, seeking fo find fhe solufion of minimum Euclidean norm. 
This corresponds fo look for fhe maximum likelihood estimator. 

Wilhin fhe Bayesian framework, selling fhe null space confribulion lo fhe solufion lo zero is only jusfified if Ihere 
are reasons lo believe fhaf fhe solufion is orlhogonal to fhe null space. If fhaf is nol fhe case, fhe way to proceed 
is fo encode a priori informafion in fhe prior densily and have if determine fhe confribulion of fhe null space lo 
fhe compufed solufion. The encoding of fhe information carried by fhe prior into fhe CGLS ilerafions via a righl 
preconditioned was formalized in ['Tj and in ifTTI . where fhe ferm prior conditioner was firsl proposed, alfhough fhe 
use of smoofhing righl precondifioners for linear discrete ill-posed problems has also been advocated earlier in, e.g., 

nnnHiiiQi. 

Wilhouf loss of generalify, we can assume a priori fhaf x ~ AA(0, C). If B is fhe malrix defined in 0, fhe change of 
variables 

ty = Bx ~ AA(0, In.) 
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transforms x into a zero mean white Gaussian random variable, hence we can write the linear system (|7]l in terms of 
the new variable as 

AB-^w = b, (9) 

where 

X = B-^w. ( 10 ) 

The CGLS algorithm applied to the system 0 with starting vector tuo = 0 computes a sequence wo,wi,... of 
approximate solutions for Q and a corresponding sequence of vectors in the original coordinate system through 
( [To] ). It turns out that this simple change of variables determines a sequence of approximate solution which may 
be very different from that computed by the CGLS method applied directly to (0. To investigate the effect of the 
priorconditioning on the Krylov subspaces where the approximate solutions are computed in relation to the null 
space of the coefficient matrix, we need some linear algebra results. 

Consider the canonical orthogonal decomposition of the space of the solution into the null space of A and its orthog¬ 
onal complement, which is the range of A^, 

= ^(A)e^(A'^). ( 11 ) 

It follows from ( [IT] ) that the Krylov subspace ^ is orthogonal to ^(A), hence Xk -L c/L(A). In other words, all the 
CGLS iterates are perpendicular to the null space of A. 

Introducing the notation A = AB~^ to simplify the notation, we observe that the jth iterate of the whitened problem 
(0 solves the minimization problem 

Wj = argminjllAm — 6 || | ru G ICj{A'^b, A'^A)'j, 

and the corresponding jth Priorconditioned CGLS (PCGLS) solution xj = B~^Wj satisfies 

Xj G span{B“^(X''A)^A'''6 |0<£<j — l}. (12) 


It follows from 

that 


B'M' = B^^B^'A' = CA', 

B~i (I^A)^A^ = (CA^A)^CA^, 0 < ^ < j - 1. 


Therefore, in view of (fT^-([T3]), 


Xj G C(,yL(A)-‘-), 


(13) 


(14) 


and Xj is not necessarily orthogonal to the null space of A. This observation is discussed further in the light of 
subspace factorizations in the next section. 


2.2 Subspace factorizations 

To characterize the subspaces where the iterates of the PCGLS method are computed, we need some results about a 
generalization of the singular value decomposition. 

The generalized singular value decomposition (GSVD), first introduced in ll44ll and then described in ifTSll . is a 
standard tool to analyze regularized solutions produced by Tikhonov regularization with a regularization functional 
different from the identity ll2^l2^l^[T^ . In our analysis we resort to formulation proposed in 1321. 
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Theorem 2.1 Given a matrix pair (A, B) with A G B G m < n, it is possible to find a factorization 

of the form 

A = U [ Om,n-m ] X-\ B = V 

called the generalized singular value decomposition, where U G and V G are orthogonal matrices, 

X G is an invertible matrix, and Za G and Zb G are diagonal matrices. The diagonal entries 

s^\ ..., Sm^ and • • •, Sm^ of the matrices Za and Zb are real, nonnegative and satisfy 



*1 ^ *2 ^ ^ 

(B) (B) (B) 

*1 — *2 — • • • — 

+ (sf V = 1> 1 < j < rn. (15) 


.(A) 


It follows from the condition (15) that 0 < Sj < 1 and 0 < s 
called the generalized singular values of the matrix pair (A, B). 


(B) 


< 1. The ratios s^- /s^- for 1 < j < m are 


If A has full row rank, as we assume for simplicity here, the diagonal entries of Za are positive. 

Given a symmetric positive definite matrix C, the C-inner product, or prior energy inner product, of two vectors x 
and y is defined as 

{x,y)c = x'''C~^y. 

Two vectors x and y are C-orfhogonal, denoted by x _Lc U if and only if {x,y)c = 0. When x,y are random 
variables C-orfhogonalify corresponds fo independence of fhe Iransformed variables Bx and By (where fhe mafrix B 
is defined in Q). 

The following fheorem sheds some lighf on fhe sfrucfure of fhe null space of A in fhe GSVD basis. Given a mafrix 
Y, we use fhe nofafion span(Y) fo denote fhe subspace spanned by fhe columns of fhe mafrix Y. 


Theorem 2.2 If we partition the matrix X G in Theorem 2.1 as 

X = [ X' X" ] , X' G X" G 


it follows that 

c/G(A) = spanjX'}, 

and we can express M” as a C-orthogonal direct sum, 

M” = spanjX'} ©c span{X''| = ^(A) ©c spanjX''}. 


Proof. Let z G ^(A) and define 


From fhe observation fhaf 


C = x-^z = 


C' 

C" 


(f £ g Tn>n—m 


AZ = [)[ Qm,n-m ] C = UZaC" = 0, 
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the orthogonality of U, and the positivity of the diagonal elements in Za it follows that C,” = 0. This implies that 


z = XC = G span(X'), 


that is, 

c/K(A) C span(X'). 

It is easy to see, by following the argument backwards, that the inclusion holds in the other direction also. 
To show the C-orthogonal decomposition of the space M” in terms of the basis X, observe that 


C-i = = X-'^ 


In—m 




X- 


or, equivalently, 

C = (B'^B)^^ = X 

which is tantamount to saying that C is diagonal in the basis X. 

To show that 

spanjX'l _Lc spanjX''}, 

let X G spanjX'} and y G spanjX"}. Then 



x = X'a = X 


a 

0 


, y = X"/3 = X 


0 

/? 


therefore 


Tv-T 


{x,y)c =[«’’’ 0 ] X''’X 


5"2 


X-^X 


0 


= [«"'' 0 


as claimed. Hence, any vector x G M” can be expressed in the X-basis as 



(16) 


x = X 


a 

/3 


X'a + X"/3 


in a unique way, thus completing the proof. □ 

If follows immediately from the theorem above that 

= ^(A^), ^(A)^c = spanjX"}. 


This factorization allows us to interpret the effect of priorconditioning: According to ( [T4| ), we have 

Xj G C(,^(A'^)). 

In particular, if .^(A^) is an invariant subspace of the covariance matrix C, then the iterates Xj are orthogonal to 
the null space of A, and the PCGLS is not capable of informing the null space. In particular, in statistical terms, 
if the orthogonal projections of x onto the subspaces ^(A) and M^k^) are uncorrelated, then these spaces are 
C-orthogonal, which implies that C(,^(A^)) C .^(A^). Conversely, if the projections are correlated, the component 
of the PCGLS iterates may have a significant component in the null space of A. This lack of C-orthogonality is the 


















key to include a priori information about the contribution from the null space, hence invisible to the likelihood, into 
the computed solutions. 

A good measure for the level of correlation introduced by the priorconditioner may therefore be the distance from 
C-orthogonality of the null space of A and the range of its transpose. This distance can be quantified, e.g., in terms 
of the C-inner product as 


C) = inf I I X G aK(A), y G M{k^)\ . 

Ukllcllyllc J 

Observe that the subspace decompositions above show that the C-orthogonal projections of x on oT^(A) and spanjX"} 
are always uncorrelated; see also |[25l for discussion of oblique projections in the context of standard transforma¬ 
tions. 


2.3 The Lanczos process 

The convergence rate of the CGLS algorithm is related to the spectral properties of the matrix, and consequently, it is 
of interest to understand how preconditioning changes the spectral properties and the order in which eigendirections 
are included in the computed solution by the CGLS method. We consider the Lanczos process implicitly defined by 
fhe Krylov subspace iferafions. If is known jlTl fhaf fhe firsf k residual vecfors compufed by CGLS and normalized fo 
have uniflengfh form an orfhonormal basis for fhe Krylov subspace/Cfc(A^6,A^A). Denofe fhemby xq, xi,... 
and collecfed fhem info fhe mafrix V^. If 

Xj — ^j—i T iPj—1 

is fhe CGLS updafing formula which compufes fhe yfh iferafe Xj from fhe (j — l)sf by adding a correction along fhe 
search direction Pj-i, and 

Pj = G + 

is fhe formula fo updafe fhe search direcfion wifh rj = b — A^Axj , if is sfraighfforward to show that 

A^AVk = VkJk - 

ak-i 

In the last formula, is the kth column of the k x k identity matrix and T^ is the tridiagonal matrix 

Tfc = L-^A-^U, 

where = diag{ao, • • • U = ^k = diag{||ro, ||,. .., ||rfc_i||}, and is the k x k upper 

bidiagonal matrix with ones on the main diagonal and —/3o,..., —fik-i on the superdiagonal. 

It follows from the orthogonality of the vi that 


yl{A^A)Vk = Jk, 

therefore T^ is the projection of AJA onto the Krylov subspace /Cfc(A^6, A^A). One can show that the kth CGLS 
iterate can be expressed as 

^k '^kUkt 
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where yk solves the k x k linear system 

Tfc?/ = ||ro||ei. 

Before proceeding with our analysis, we recall the following result about the rate of convergence of the conjugate 
gradient method. 


Theorem 2.3 If the matrix A is symmetric positive definite, and r/k = — Xk is the error in the kth iterate of the 

conjugate gradient method, then 

ll>ftllA < 2 (^1^) IIxoIIa, 

where n = A^/Ai is the condition number of fK. 


Replacing the matrix A with A^ A, and noting that because of the termination criterion effectively \n can be replaced 
by the smallest nonzero eigenvalue of A^A, we can rewrite the theorem above as follows. 


Theorem 2.4 If Xk is the kth iterate computed with the CGLS method and the initial residual A A, it follows that 

k 


InW < 2 


K — 1 


y/H + 1 


where k is the condition number o/A^A. 


Exploiting the connection between the Lanczos process, orthogonal polynomials and Gauss quadrature rules, we 
have the following characterization of the norm of the residual for the normal equations solved with the CGLS 
method in terms of the eigenvalues of the matrix A^A and of their approximation with the corresponding Ritz value, 
which are the eigenvalues of the tridiagonal matrix T^. This is an adaptation of a result in ifTTll (page 203). 


Theorem 2.5 Let Aj G M denote the ith eigenvalue of A^A, and qi G be the corresponding eigenvector, 1 < 
(k) 

i < n. Further, let 9^ ' be the jth eigenvalue of the tridiagonal matrix T^, 1 < j < A: < r, where r is the rank of 
the matrix A. For all k, 1 < k < r, there exists a number ^k, Ai < < Xr such that the norm of the kth residual 

satisfies 


c2fc+l 

i=l 


n(^ 

f=i 


-of)' 




In particular, the rate of convergence of CGLS depends on how accurately the eigenvalues A* are approximated by the 

(k) 

Ritz values 9 - ’, and in particular those eigenvalues corresponding to eigendirections with a significant component 
of the initial residual tq: As soon as a Ritz value has converged to A*, the contribution to the residual from the 
corresponding eigendirection vanishes. In turn, the convergence rate of the CGLS method depends on how accurately 
the eigenvalues of projected tridiagonal matrix approximate those of A^A, see, e.g., ITO . hence it is reasonable to 
assume that the kth iterate is richer along those directions that approximate those eigenvectors of A^A where the 
components of rg are larger. 

In general, if the vector ro is the linear combination of a few eigendirections of A^ A, in exact arithmetic it will require 
only a few iterations of the CGLS methods to meet the termination criterion, and the solution vectors will live in a 
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low dimensional Krylov subspace methods. Conversely, the richer in eigendirections ro is, the more iteration will 
take the CGSL method to satisfy the stopping rule, and the larger the dimension of Krylov subspace of the solution 
will be. 

It has been observed experimentally that the introduction of a prior conditioner in the CGLS method changes the 
spectral properties of the underlying normal equations in two ways. In fact, the projections of the initial residual 
vector along the eigenvectors of A^A associated with nonzero eigenvalues are more even distributed, and the range 
of the nonzero eigenvalues widens, sometimes significantly. In view of these observations and of Theorem |2. 5 [ we 
expect the convergence rate to slow down, because a larger number of eigenvalues of A^A must be approximated 
accurately by the Ritz values, in order to reduce the norm of the discrepancy below the assigned threshold. A 
consequence of this is that a larger number of eigendirections contribute to the computed solution, with the result 
of providing a richer approximation of the underlying signal. We illustrate this in the next section with computed 
examples. 


3 Computed examples 

In this section, we consider two linear inverse problems to elucidate the analysis of the PCGLS algorithm. The 
purpose of the first simple one-dimensional test model is to follow step by step the changes in the Krylov subspaces 
and related quantities induced by the introduction of a prior conditioner, while the second example is a more realistic 
problem inspired by the medical imaging applications that motivated the present analysis. 


3.1 Example 1: One-dimensional deconvolution 


Let / : [0,1] —)• M be a piecewise continuous function, /(O) = 0, and assume that the observations consist of few 
noisy convolutions by an Airy kernel function 


a{t) = 




where k > 0 is a parameter regulating the width of the kernel. To discretize the problem, we subdivide the interval 
[0,1] by introducing n equidistant points si,... ,Sn and approximate the convolution integral by a finite sum of the 
form 

^ n 

git) = ait-s)fis)d,s KiSk)fisk), I < j < n, 

^ k=i 

where Sj = j/n. We assume that the data consist of a m discrete noisy measurements of p at fi,..., tm, that is 


be = 9iti)+S£, l<i<m, 


where m <C n. If we denote by A the matrix whose entries are 


Q.jf Sfc); 

n 


and by x the vector with components 


/ ('Sfc)) 


1 < j < m, 1 < k < n, 


1 < k < n, 
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Figure 1: The six basis vectors that span ^(A^) (dashed curve), and the vectors that span C(,^(A^)), respectively 
(solid curve). The values of tj corresponding to the data are indicated by the vertical red lines. 


we can formulate the problem as the linear system ([^. 

In this computed example, we set n = 150 and m = 6, thus the linear system is strongly underdetermined. We 
quantify the index of under determinacy by the ratio of unknowns to equation, which in this case is n/m = 25. 
The width parameter in the simulations is k = 0.02. We solve this linear system iteratively starting with initial 
approximate solution xq = 0 in two different ways: first we use the plain CGLS method with a termination criterion 
based on the discrepancy principle, then we solve it with the priorconditioned CGLS method. The prior conditioner 
corresponds to a second order Gaussian prior, and is defined through the factorization of its precision matrix. 


C-i = L'^L, L = /3 


a 

-1 2 -1 


-1 2 -1 
a 


where the value of the scalar a > 0 is selected to guarantee uniform variance for all pixels (see lHU). 

The six basis functions spanning the subspace ,yL(A)-*- = used by the plain CGLS to compute the iterates, 

and the six vectors spanning C(^(A)-*-) = C(^(A^)) are show in Figure As expected, the basis functions 
reflect well the expected behavior of the underlying function as postulated by the chosen prior. 


To demonstrate the effect of the priorconditioning on the CGLS algorithm, we generate noisy data assuming that the 
underlying signal is a smooth sigmoid function and the additive noise in the data is scaled zero mean white Gaussian, 
e ~ AA(0, The noise level is chosen very low, a = 5 x 10“^, in order to allow the algorithm to compute 

more iterates; with higher noise level, the stopping criterion is satisfied after one or two iterations. Figure shows 
the sequence of CGLS and PCGLS iterates. In this example the standard CGLS converges very quickly, leading to 
a termination after three steps, while the PCGLS requires more iterations. Moreover, since the CGLS cannot inform 
the solution about the component of the solution in the null space of A, it returns a very poor reconstruction in the 
intervals between the observation points, as shown in the left plot of Figure]^ The PCGLS algorithm, on the other 
hand, is able to extract meaningful information about the solution from the null space of A guided by the data and 
the prior, and returns a better solution. 
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Figure 2: The progress of the iterations based on plain CGLS (left) and the priorconditioned CGLS (right). The 
final result when iterations are stopped at discrepancy is the boundary shape of the surface. The true profile that was 
used to generate the data is shown in red. 
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Iteration 

Figure 3: The norm of the projection of the current iterate onto the null space divided by the norm of the current 
iterate, plotted as a function of the iteration number. The plot indicates that more than 60% of the priorconditioned 
iterates is in the null space, while without the priorconditioning, the null space component vanishes. 



An important role of the priorconditioner is to unlock important information about the underlying solution hidden 
in the null space by producing iterates not necessarily orthogonal to null space of A. To quantify the distance from 
C-orthogonality of the span of X" and the null space of A introduced by the priorconditioner at each iteration step, 
we compute the relative size of the orthogonal component of the PCGLS iterates Xk on the null space of A. 




llPa^fcll 

Pfcll 


P : 



^(A). 


The plot of these components as a function of the iteration number, shown in Figure]^ indicates that in this example 
more than 60% of the priorconditioned solution is in the null space, while without the priorconditioning, the null 
space component vanishes. 

To illustrate how this null-space unlocking prior conditioner changes the eigenvalues of the least squares problem. 
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Figure 4: Left: The eigenvalues of the tridiagonal matrix G as a function of the iteration number k. The 
non-zero eigenvalues of the matrix A^A are indicated by horizontal lines. Observe that two of the eigenvalues are 
nearly equal. Right: The corresponding eigenvalues using the priorconditioned scheme. To help comparing the two 
cases, the spectral interval of the matrix without priorconditioning is indicated as a shaded band in the plot. 


we compute the spectra of the projected matrices 

j. = vj(A'^A)V j, fj = vj(A'^A)V j 1 < j < m, 

and the projections of the respective initial residuals onto the eigenvectors associated with the non-zero eigenvalues. 
Figure shows the non-zero eigenvalues and the progression of their approximation with the eigenvalues of the 
Lanczos tridiagonal matrices as j increases. 


Table 1: Absolute values of he projections of the initial residual on the eigenvectors of A^ A (left) and A^ A (right). In 
agreement with Figure]^ the order in which the first four Ritz values (ORV) converge to the eigenvalues correspond 
the order of the largest projections. 


Plain CGLS 

ORV 

Preconditioned CGLS 

ORV 

0.0196 

(3) 

8.2586 

(1) 

2.9481 

(1) 

4.3337 

(2) 

0.0329 


0.0759 

(4) 

0.0232 

(4) 

0.1214 

(3) 

0.4073 

(2) 

0.0031 


0.0004 


0.0044 



Since the convergence rate of the CGLS method depends on the condition number of the coefficient matrix, on the 
basis of the above observation that the width of the interval defined by the nonzero eigenvalues of A^A is smaller 
than the corresponding interval for the priorconditioned problem, we expect that the plain CGLS method satisfies 
the stopping criterion in fewer iterations than required by the PCGLS method. Figure [^confirms that this is indeed 
the case: the plain CGLS method stops after three iterations, while six iterations are required for the PCGLS method 
to terminate. 
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Figure 5: The residual of the iterative solutions as a function of the iteration number. The red curve with square 
markers corresponds to the plain CGLS, while the preconditioned residuals are marked by blue dots. The noise level 
used in the stopping rule is indicated by the horizontal line. 


3.2 Example 2: Computerized tomography with few radiographs 

In the second computed example, we consider the problem of estimating a two-dimensional density map in a bounded 
domain from noisy observation of integrals over a sparse set of lines across the domain. For the sake of definiteness, 
let n = [—1/2,1/2] X [—1/2,1/2] represent the domain of interest, and let p : If — )• M denote a piecewise continuous 
non-negative density function. For simplicity, we assume that supp(p) C B{0, 1/2), the disc centered at the origin 
and with radius 1/2. Let i{s, 9) denote a line segment crossing the domain If, 

i{s,9) = {y = (yi, 2 / 2 ) £ ^ \ yi cos 9 + y 2 sin0 = s}, |s|<l/2, O<0< 27r. 


The data consist of noisy observations of the integrals 

9{s,9)=[ p{y)dS{y)= [ p{y{t))dt, 
Je{s,e) Jo 


(17) 


where y = y{t) indicates the parametrization of the line segment with respect to the arc length t such that y (0), y (S') G 
dQ. The integrals are supported over a discrete set of lines, parametrized as {("(si, 9i),..., £{sm, 9m)}- It follows 
from the Fourier slice theorem OTI that, in theory, the knowledge of the integrals over all lines crossing Q is enough 
to determine p uniquely. In practice, if the set of lines is sparse, the problem becomes underdetermined and therefore 
ill-posed. 


To discretize the problem, we divide the square Q into n x n square pixels of equal size, denoted by Qj, I < j < 
= N. The density p is approximated by a piecewise constant function, p|^ = Xj, and the model ([IT]) for 
noiseless data is approximated by 


y ^fc) ~ ^ ^ 9j^) n rij I Xj , 

i=i 

where 0fc) H Q-j \ denotes the length of the intersection of the line £{sk, 9k) and the jth pixel. Assuming additive 

noise e G M™, the model leads to the linear observation equation h = kx + e with obvious notations. 
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Figure 6: The original image of size 160 x 160 pixels (left) and the sinogram data (right), plotted as an image of 
size 60 X 20. The horizontal axis eorresponds to the 20 different illumination angles. 


We eonsider a family of priors of Whittle-Matern type, defined as follows: Let Ad denote the Dirichlet Laplaeian 
over Q, that is, 

&{AD) = {ueH\n)\u\g^ = 0}. 

We define the preeision operator, the inverse of the covariance operator, through the formula 

^ = —Ad + 

where the parameter A > 0 is a correlation length. This class of priors has been discussed extensively in recent 
articles ll^ [371, and it has been shown that their discrete approximations are, in a certain sense, discretization 
invariant. 

To discretize the prior, we use a finite difference approximation of the Laplaeian, writing the precision matrix 
K E as 

K = -U®D-D®U + ^U, 

where D E is the three-point finite difference approximation of the one-dimensional Laplaeian with Dirichlet 
boundary conditions, 

‘ -2 1 


1 -2 

and (g) denotes the Kronecker product of matrices. To generate the data, we consider the gray scale image of size 
160 X 160 shown in Figure]^ and illuminate it from = 20 illumination angles that are evenly distributed over the 
interval [—7r/2,7r/2), 9j = —TT(2+j7r(ng, 0 < j <ng — l. For each illumination direction, Ug = 60 parallel beams 
are chosen, corresponding to values Sk = —1/2 + k/{ns — 1), 0 < /c < n* — 1, yielding to a severely undersampling 
of the sinogram data. The forward matrix obtained in this manner is of size 1 200 x 25 600, hence the index of under 
determinacy is n/m = 21.3. The noiseless undersampled sinogram data is shown in Figure]^ 
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Figure 7: A single column of the matrix representing a basis vector of the subspace ^(A^) represented as 
a 160 X 160 image (top left), and the same vector after a multiplication of the covariance matrix C = K“^, with 
different values of the correlation length parameter A. Measured in units of pixels, the correlation length is chosen 
as A = 2,4 pixels (top row), and A = 8,16 and 32 pixels (bottom row). 

As in the previous example, we start by demonstrating the effect of priorconditioning on the basis vectors of .^(A^). 
In the present example, the column vectors of the matrix A^ allow an interpretation as an image of size 160 x 160. 
Therefore, in Figure we show a column of the matrix, in image form, as well as the same vector multiplied 
by covariance matrices C corresponding to different correlation lengths A. As expected, the plain column vector 
represents a beam traversing the image area, while the multiplication with the covariance matrix creates blurring 
that increases with the correlation length. Observe that any image that is supported on pixels that have an empty 
intersection with all the beams defining the data are orthogonal to the columns of A^, thus in the null space of A. 
Consequently, we expect that the plain CGLS algorithm produces an image that has a strong geometric artifact since 
the pixels with empty intersection with the beams remain dark. The application of the covariance operator on the 
basis vectors, on the other hand, increases the width of the beams, hence the preconditioned CGLS solution does not 
display this artifact, producing a smoother solution. This is indeed the case, as illustrated in Figure where the two 
solutions produced by the CGLS method with and without prior conditioner are displayed side by side. 

As in the previous example, the enrichment of the Krylov subspaces by vectors that lie in null space of the matrix A 
slows down the convergence of method. In the left panel of Figure]^ we have plotted the discrepancy as a function 
of the iteration round both for plain CGLS and the prior conditioned version. Moreover, we plot the eigenvalues of 
the tridiagonal matrix T^ approximating the eigenvalues of A^A, as well as the corresponding approximations for 
the prior conditioned version of the algorithm. The eigenvalue approximations indicate that the eigenvalues of the 
prior conditioned matrix are significantly more spread out than without prior conditioning. Numerical experiments 
indicate that by increasing the correlation length parameter A and thus widening the beams sounding the unknown 
density, the null space of A has a stronger role in the reconstructions and more iterations are needed for convergence 
to discrepancy. 
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Figure 8: The CGLS solutions with no priorconditioning (left) and with priorconditioning with correlation length 
A = 4 pixels. 
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Figure 9: Left: The discrepancy as a function of the iteration round, the plain CGLS corresponding to the red 
squares, and the prior conditioned CGLS to the blue dots. Right: The spectrum of the tridiagonal matrix T^ as a 
function of iteration round. The red squares correspond to the plain CGLS, the blue dots to the priorconditioned 
CGLS. 


18 
















Figure 10: The similarity maps between the original image and the images obtained by CGLS with no priorcondi¬ 
tioning (left) and with priorconditioning (right). Light colors denote the regions where the similarity is higher. 


To evaluate the quality of the reconstructed images we use the Structural Similarity (SSIM) index introduced in ll^ . 
which measures the structural differences between two images. The SSIM index between the original image Iq and 
the reconstructed image Irec is defined as 


SSIMiloJrec) = 


(2 flQ fJ>rec Tl) (^ ^o,rec + 72) 

(/^o + /^rec + 7l)(c^o + ^^rec + I2) 


where /Tq (firec) and Uo {(Tree) are the mean intensity and the standard deviation of the original (reconstructed) image, 
respectively, and ao,rec is the correlation between and Irec- We notice that the SSIM index is always < 1, being 
equal to 1 when the images to be compared are identical. 

The two constants 71, 72 are added to avoid instability when the means /io and Hrec or the standard deviation do and 
arec are small. Typical values are 71 = 0.01 L and 72 = 0.03 L, where L is the dynamic range of the images, i.e., 
the ratio between the largest and smallest values of the pixel intensities. 

In Figure [^the similarity maps between the original image (Figure]^ left) and the reconstructed images (Figure 
are shown. The maps are obtained by evaluating the local SIMM index in a number of local Gaussian windows 
(see for details). The higher quality of the reconstruction obtained by the solution with preconditioning is 
emphasized by values of the SSIM index close to 1 in the region of interest. Compressing the similarity in a single 
indicator, the mean SSIM index is 0.120 when the solution is obtained by the CGLS without priorconditioning, while 
its value is 0.409 when the priorconditioning is used. 


Acknowledgements 


This work was completed during the visit of DC and ES at University of Rome “La Sapienza” (Visiting Re¬ 
search/Professor Grant). The hospitality of the host university is kindly acknowledged. The work of ES was 
partly supported by NSE, Grant 1312424. This work was partially supported by grants from the Simons Eoundation 
(#305322 and # 246665 to Daniela Calvetti). 


19 











References 


[1] Arridge SR, Betcke MM and Harhanen L (2014). Iterated preeonditioned LSQR method for inverse problems 
on unstruetured grids. Inverse Problems 30 doi:10.1088/0266-5611/30/7/075009. 

[2] Baillet S, Mosher JC and Leahy RM (2001). Eleetromagnetie brain mapping. Signal Proeessing Magazine, 
IEEE, 18 14-30. 

[3] Bjdrek, A. (1996). Numerical methods for least squares problems. SIAM, Philadelphia. 

[4] Calvetti D (2007) Preeonditioned iterative methods for linear diserete ill-posed problems from a Bayesian 
inversion perspeetive, J. Comp. Appl. Math. 2 378-395. 

[5] Calvetti D, MeGivney D and Somersalo E (2012) Eeft and right preeonditioning for eleetrieal impedanee 
tomography with struetural information. Inverse Problems 28 055015. 

[6] Calvetti D, Morigi M, Reiehel E and Sgallari E (2000) Tikhonov regularization and the E-eurve for large, 
diserete ill-posed problems, J. Comput. Appl. Math. 123 423^46. 

[7] Calvetti D, Reiehel E and Shuibi A (2003) E-eurve and eurvature bounds for Tikhonov regularization, Numer. 
Algorithms 35 301-314. 

[8] Calvetti D, Reiehel E and Shuibi A (2003) Enriehed Krylov subspaee methods for ill-posed problems, Einear 
Algebra Appl. 362 257-273. 

[9] Calvetti D and Reiehel E (2003) Tikhonov regularization of large seale problems, BIT 43 263-283. 

[10] Calvetti D, Reiehel E and Shuibi A (2005) Invertible smoothing preeonditioners for linear diserete ill-posed 
problems, Appl. Numer. Math. 54 135-149. 

[11] Calvetti D and Somersalo E (2005) Prioreonditioners for linear systems. Inverse Problems 21 1397-1418. 

[12] Calvetti D and Somersalo E (2007) Introduction to Bayesian Scientific Computing — Ten Lectures on Subjective 
Computing. Springer Verlag. 

[13] Cheney M, Isaaeson D and Newell JC (1999) Eleetrieal Impedanee Tomography. SIAM Rev 41 85-101. 

[14] Choi SC and Saunders M (2014) Algorithm 937: MINRES-QEP for symmetrie and Hermitian linear equations 
and least-squares problems. ACM ACM Trans. Math. Software (TOMS) doi^ 10.1145/2527267. 

[15] Dykes E and Reiehel E (2014) Simplifying GSVD eomputations for the solution of linear diserete ill-posed 
problems . J. Comp. Appl. Math. 255 17-27. 

[16] Elden E (1982) A weighted pseudo inverse, generalized singular values, and eonstrained least squares prob¬ 
lems. BIT 22 487-502. 

[17] Golub GH and Meurant G (2009) Matriees, moments and quadrature with applieations. Prineeton University 
Press. 

[18] Golub GH and Van Eoan CP (2012) Matrix computations (Vol. 3). JHU Press. 


20 



[19] Hanke M (1995) Conjugate Gradient Type Methods for Ill-Posed Problems Pitman Researeh Notes in Mathe- 
maties. Longman, Harlow, UK. 

[20] Hanke M and Hansen PC (1993) Regularization methods for large-seale problems, Surv. Math. Ind., 3:253n315. 

[21] Hansen PC (1989) Regularization, GSVD and truneated GSVD. BIT 29 491-504. 

[22] Hansen PC (1998) Rank-Defieient and Diserete Ill-Posed Problems: Numerieal Aspeets of Linear Inversion, 
SIAM, Philadelphia. 

[23] Hansen PC (2010) Discrete inverse problems: insight and algorithms. SIAM, Philadelphia. 

[24] Hanke M, Nagy J and Plemmons R (1993) Preeonditioned iterative regularization for ill-posed problems, in 
Numerieal Linear Algebra and Seientifie Computing, ed. by Reiehel L, Ruttan A, Varga RS. de Gruyter, Berlin, 
Germany: 141-16. 

[25] Hansen PC (2013) Oblique projeetions and standard-form transformations for diserete inverse problems. Nu- 
mer. Linear Algebra Appl. 20 250-258. 

[26] Hestenes MR and Stiefel E (1952) Methods of eonjugate gradients for solving linear systems (Vol. 49, pp. 
409^36). Washington, DC: National Bureau of Standards. 

[27] Homa L, Calvetti D, Hoover A and Somersalo E (2013) Bayesian Preeonditioned CGES for Souree Separation 
in MEG Time Series. SIAM J Sei Comp 35: B778-B798. 

[28] Kaipio J P and Somersalo E (2004) Computational and Statistical Inverse Problems (New York: Springer- 
Verlag) 

[29] MeGivney D (2013) Statistical preconditioners and quantitative imaging in electrical impedance tomography, 
PhD Thesis, Case Western Reserve University. 

[30] MeGivney D, Calvetti D and Somersalo E (2012) Quantitative imaging with eleetrieal impedanee speetroseopy. 
Phys. Med. Biol. 57 72-89. 

[31] Natterer E (1986) The Mathematics of Computerized Tomography. Wiley, New York. 

[32] Paige CC and Saunders MA (1981) Towards a generalized singular value deeomposition. SIAM J Numer Anal 
18 398-405. 

[33] Paige CC and Saunders MA (1982) ESQR: An algorithm for sparse linear equations and sparse least squares. 
ACM Transaetions on Mathematieal Software (TOMS), 8, 43-71. 

[34] Pursiainen S and Kaasalainen M (2013) Iterative alternating sequential (IAS) method for radio tomography of 
asteroids in 3D. Planetary and Spaee Seienee, 82 84-98. 

[35] Pursiainen S and Kaasalainen M (2014) Sparse souree travel-time tomography of a laboratory target: aeeuraey 
and robustness of anomaly deteetion. Inverse Problems 30 114016. 

[36] Roininen E, Eehtinen M, Easanen S, Orispaa M and Markkanen M (2011) Correlation priors. Inverse Problems 
and Imaging 5 167-184. 


21 



[37] Roininen L, Huttunen J and Lasanen S (2014) Whittle-Matern priors for Bayesian statistical inversion with 
applications in electrical impedance tomography. Inverse problems and imaging 8 561-586. 

[38] Saad Y (2003) Iterative methods for sparse linear systems. SIAM, Philadelphia. 

[39] van der Vorst HA (2003) Iterative Krylov Methods for Large Linear systems. Cambridge University Press, 
Cambridge. 

[40] van der Sluis A and van der Vorst HA (1986) The rate of convergence of conjugate gradients. Numer. Math. 48 
543-560. 

[41] Stuart AM (2010) Inverse problems: a Bayesian perspective. Acta Numerica 19 451-559. 

[42] Tarantola A (1987) Inverse Problem Theory. Elsevier, 1987. Reprinted in 2004 by SIAM, Philadelphia. 

[43] Wang Z, Bovik AC, Sheikh HR and Simoncelli EP (2004) Image quality assessment: from error measurement 
to structural similarity. IEEE Trans. Image Processing 13 600-612. 

[44] Van Eoan CP (1976) Generalizing the singular value decomposition. SIAM J Numer Anal 13 76-83. 


22 



